Current Issue: April-June | Volume: 2021 | Issue Number: 2 | Articles: 5
Digital watermarking technology plays a powerful role in the protection of digital media copyright, image authentication, image sharing, image information transmission, and related fields. Driven by strong demand, digital image watermarking has attracted wide research interest and has developed into one of the most active research directions in information science. In this paper, we present a novel robust digital watermarking algorithm based on the finite discrete Radon transform (FDRT) tight frame. The FDRT of a zero-mean image is a tight frame with frame bounds A = B = 1, so the frame is its own dual; decomposition and reconstruction with the FDRT tight frame therefore introduce no image distortion. Embedding a hidden watermark amounts to adding a weak signal to the strong background of the original image, and watermark extraction amounts to reliably detecting that embedded weak signal. We analyze the feasibility of the watermarking algorithm from the two aspects of information hiding and robustness. We select an independent Gaussian random vector as the watermark series and use the peak signal-to-noise ratio (PSNR) as the visual degradation criterion for the watermarked image. Based on the self-dual operator of the FDRT tight frame, we derive the relationship among the strength parameter, the sum of squares of the watermark series, and the PSNR. Simulations with the Checkmark benchmark show that the algorithm is robust against important image-processing attacks such as lossy compression, MAP estimation, filtering, segmentation, edge enhancement, jitter, quadratic modulation, and general geometric attacks (scaling, rotation, shearing).
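The embedding principle lends itself to a short sketch. Because a Parseval frame (A = B = 1) preserves energy, adding a perturbation α·w to the coefficients of an M × N 8-bit image yields PSNR = 10·log10(255²·MN / (α²·Σᵢwᵢ²)), which is the kind of relationship the abstract refers to. The Python sketch below illustrates additive embedding and correlation-based detection in a generic orthonormal transform domain, with a 2-D DCT standing in for the paper's FDRT; the function names, strength α, and coefficient-selection rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_watermark(image, key, alpha=2.0, n=1024):
    """Additively embed a Gaussian watermark series in a transform domain.

    Sketch only: a 2-D DCT (an orthonormal transform) stands in for the
    FDRT tight frame of the paper; `alpha` is the strength parameter.
    """
    rng = np.random.default_rng(key)
    w = rng.standard_normal(n)                    # i.i.d. Gaussian watermark series
    coeffs = dctn(image.astype(float), norm="ortho")
    flat = coeffs.ravel()
    idx = np.argsort(np.abs(flat))[::-1][1:n + 1] # strongest coefficients, skip DC
    flat[idx] += alpha * w                        # weak signal on a strong background
    return idctn(flat.reshape(coeffs.shape), norm="ortho"), w, idx

def detect_watermark(image, w, idx):
    """Correlation detector: identify the embedded weak signal."""
    flat = dctn(image.astype(float), norm="ortho").ravel()
    return flat[idx] @ w / (np.linalg.norm(flat[idx]) * np.linalg.norm(w))
```

A detector response close to zero indicates an unwatermarked image, while a markedly positive normalized correlation indicates the presence of the watermark generated from the same key.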
Recognition of human emotion from facial expressions is affected by distortions of pictorial quality and by facial pose, which traditional video emotion recognition methods often ignore. Context information, on the other hand, provides additional cues that can further improve recognition accuracy. In this paper, we first build a video dataset with seven categories of human emotion, named Human Emotion In the Video (HEIV). With the HEIV dataset, we train a context-aware attention network (CAAN) to recognize human emotion. The network consists of two subnetworks that process face and context information, respectively. Features from facial expressions and context cues are fused to represent the emotion of each video frame and are then passed through an attention network to generate emotion scores. The emotion features of all frames are then aggregated according to their emotion scores. Experimental results show that our proposed method is effective on the HEIV dataset.
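The fusion-and-attention stage described above can be sketched compactly. The sketch below assumes per-frame face and context features have already been extracted by the two backbone subnetworks; all layer names and sizes are illustrative assumptions rather than the authors' CAAN architecture.

```python
import torch
import torch.nn as nn

class CAANHead(nn.Module):
    """Sketch of feature fusion, per-frame attention, and score-weighted
    aggregation, in the spirit of the abstract (dimensions are assumed)."""

    def __init__(self, face_dim=512, ctx_dim=512, n_classes=7):
        super().__init__()
        self.fuse = nn.Linear(face_dim + ctx_dim, 512)
        self.attn = nn.Linear(512, 1)           # per-frame attention score
        self.cls = nn.Linear(512, n_classes)    # seven emotion categories

    def forward(self, face_feats, ctx_feats):
        # face_feats, ctx_feats: (T, dim) features for T video frames
        f = torch.relu(self.fuse(torch.cat([face_feats, ctx_feats], dim=-1)))
        a = torch.softmax(self.attn(f), dim=0)  # emotion score per frame
        video_feat = (a * f).sum(dim=0)         # aggregate frames by score
        return self.cls(video_feat)             # video-level emotion logits
```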
This paper proposes and evaluates the LFrWF, a novel lifting-based architecture for computing the discrete wavelet transform (DWT) of images using the fractional wavelet filter (FrWF). To reduce the memory requirement of the proposed architecture, only one image line is read into a buffer at a time. Aside from an LFrWF version with multipliers, the LFrWFm, we develop a multiplier-less version, the LFrWFml, which reduces the critical path delay (CPD) to the delay T_a of an adder. The proposed LFrWFm and LFrWFml architectures are compared with state-of-the-art DWT architectures in terms of the required adders, multipliers, memory, and critical path delay. Moreover, the proposed LFrWFm and LFrWFml architectures, along with the state-of-the-art FrWF architectures (with multipliers, FrWFm, and without multipliers, FrWFml), are compared through implementation on the same FPGA board. The LFrWFm requires 22% fewer look-up tables (LUTs), 34% fewer flip-flops (FFs), and 50% fewer compute cycles (CCs), and consumes 65% less energy than the FrWFm. Likewise, the proposed LFrWFml architecture requires 50% fewer CCs and consumes 43% less energy than the FrWFml. Thus, the proposed LFrWFm and LFrWFml architectures appear suitable for computing the DWT of images on wearable sensors.
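The lifting idea behind the LFrWF can be illustrated on a single image line. The sketch below applies the LeGall 5/3 lifting steps, whose coefficients (1/2 and 1/4) reduce to shift-and-add operations; this is the general mechanism that enables multiplier-less designs such as the LFrWFml, though the paper's actual filter and boundary handling may differ (periodic extension and an even-length line are assumed here for brevity).

```python
import numpy as np

def legall53_lifting(line):
    """One-level 1-D DWT of a single image line via lifting (LeGall 5/3).

    Illustrates the lifting principle: every multiply by 1/2 or 1/4
    becomes an arithmetic right shift, so no hardware multiplier is needed.
    """
    x = line.astype(np.int64)
    even, odd = x[0::2].copy(), x[1::2].copy()
    # Predict step: detail = odd - floor((left_even + right_even) / 2)
    odd -= (even + np.roll(even, -1)) >> 1        # >> 1 replaces a multiply by 1/2
    # Update step: approx = even + floor((d_left + d_right + 2) / 4)
    even += (odd + np.roll(odd, 1) + 2) >> 2      # >> 2 replaces a multiply by 1/4
    return even, odd                              # approximation, detail coefficients
```

Processing one line at a time in this fashion is what keeps the buffer requirement at a single image line, the property the FrWF-based designs exploit.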
In this work, we propose a method to transform a speaker's speech information into a talking video of a target character; the method makes the mouth-shape synchronization, expression, and body posture in the synthesized speaker video more realistic. This is a challenging task because changes in mouth shape and posture are coupled with the semantic information of the audio: training is difficult to converge, and results are unstable in complex scenes. Existing speech-driven speaker methods do not solve this problem well. The proposed method first generates the sequence of key points of the speaker's face and body posture from the audio signal in real time and then visualizes these key points as a series of two-dimensional skeleton images; the final realistic speaker video is then produced by a video generation network. We take random samples of audio clips, encode audio content and temporal correlations using a more effective network structure, and optimize the network outputs iteratively with a differential loss and a pose-perception loss, yielding a smoother sequence of pose key points and better performance. In addition, by inserting specified action frames into the synthesized human-pose sequence window, the action poses of the synthesized speaker are enriched, making the synthesis more realistic and natural. To generate realistic and high-resolution pose-detail videos, we insert a local attention mechanism into the key-point network that generates the pose sequence, giving higher attention to the local details of the characters through spatial weight masks. To verify the effectiveness of the proposed method, we used the objective normalized mean error (NME) index and subjective user evaluation. Experimental results show that our method can vividly use audio content to generate the corresponding speaker videos, and that its lip-matching accuracy and expression postures are better than those of previous work. Compared with existing methods, our method shows better results on the NME index and in subjective user evaluation.
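To make the smoothing objective concrete, here is a minimal sketch of a temporal-difference ("differential") loss over pose key-point sequences. The exact formulation and weighting used in the paper are not given in the abstract, so the tensor shapes, the L1 norm, and the equal weighting below are assumptions.

```python
import torch

def differential_loss(pred_kpts, true_kpts):
    """Sketch of a differential (temporal-difference) loss for pose key points.

    pred_kpts, true_kpts: (T, K, 2) tensors of K 2-D key points over T frames.
    Penalizing the frame-to-frame motion error, in addition to the per-frame
    position error, encourages smooth pose sequences.
    """
    pred_vel = pred_kpts[1:] - pred_kpts[:-1]                # predicted motion
    true_vel = true_kpts[1:] - true_kpts[:-1]                # ground-truth motion
    pos_term = torch.mean(torch.abs(pred_kpts - true_kpts))  # L1 position error
    vel_term = torch.mean(torch.abs(pred_vel - true_vel))    # L1 velocity error
    return pos_term + vel_term
```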
Digital videos have an important and growing presence in student learning. They play a key role especially in subjects with high mathematical content, such as physics. However, creating videos is a time-consuming activity for teachers, who are usually not experts in video production. It is therefore important to know which kinds of videos students perceive as more useful, and why. In this paper we analyze students' perception of videos in an introductory physics course for engineering with over 200 first-year students at a fully online university, the Universitat Oberta de Catalunya (UOC). Students had 142 videos of several types available. We followed a qualitative methodology from a grounded theory perspective and performed semi-structured interviews. Results show that students found videos to be the most valued resource, although they considered that videos cannot substitute for text documents. Students valued human elements and found them in videos where the professor's hands appear. Finally, students consumed videos according to the course schedule, watching a video in full the first time and returning to it later in line with subsequent deliverables and exams. The main contributions of this paper are the qualitative analysis of students' perceptions in an introductory engineering physics course, the identification of the main elements that make videos useful for students, and the finding that videos showing hands are valued by students.